Why are neural networks sometimes much more accurate than decision trees: an analysis on a bio-informatics problem

نویسندگان

  • Lawrence O. Hall
  • Xiaomei Liu
  • Kevin W. Bowyer
  • Robert E. Banfield
چکیده

Bio-informatics data sets may be large in the number of examples and/or the number of features. Predicting the secondary structure of proteins from amino acid sequences is one example of high dimensional data for which large training sets exist. The data from the KDD Cup 2001 on the binding of compounds to thrombin is another example of a very high dimensional data set. This type of data set can require significant computing resources to train a neural network. In general, decision trees will require much less training time than neural networks. There have been a number of studies on the advantages of decision trees relative to neural networks for specific data sets. There are often statistically significant, though typically not very large, differences. Here, we examine one case in which a neural network greatly outperforms a decision tree; predicting the secondary structure of proteins. The hypothesis that the neural network learns important features of the data through its hidden units is explored by a using a neural network to transform data for decision tree training. Experiments show that this explains some of the performance difference, but not all. Ensembles of decision trees are compared with a single neural network. It is our conclusion that the problem of protein secondary structure prediction exhibits some characteristics that are fundamentally better exploited by a neural network model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Popular Ensemble Methods: An Empirical Study

An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund & Schapire, 1996; Schapire, 1990) are two relatively new...

متن کامل

Knowledge Extraction From Trained Neural Networks

Received Jul 16 th , 2012 Revised Aug 01 th , 2012 Accepted Sept 02 th , 2012 Artificial neural networks (ANN) are very efficient in solving various kinds of problems But Lack of explanation capability (Black box nature of Neural Networks) is one of the most important reasons why artificial neural networks do not get necessary interest in some parts of industry. In this work artificial neural n...

متن کامل

A DSS-Based Dynamic Programming for Finding Optimal Markets Using Neural Networks and Pricing

One of the substantial challenges in marketing efforts is determining optimal markets, specifically in market segmentation. The problem is more controversial in electronic commerce and electronic marketing. Consumer behaviour is influenced by different factors and thus varies in different time periods. These dynamic impacts lead to the uncertain behaviour of consumers and therefore harden the t...

متن کامل

Correction of Regression Predictions Using the Secondary Learner on the Sensitivity Analysis Outputs

For a given regression model, each individual prediction may be more or less accurate. The average accuracy of the system cannot provide the error estimate for a single particular prediction, which could be used to correct the prediction to a more accurate value. We propose a method for correction of the regression predictions that is based on the sensitivity analysis approach. Using prediction...

متن کامل

Is it worth generating rules from neural network ensembles?

Although many authors generated comprehensible models from individual networks, much less work has been done in the explanation of ensembles. DIMLP is a special neural network model from which rules are generated at the level of a single network and also at the level of an ensemble of networks. We applied ensembles of 25 DIMLP networks to several datasets of the public domain and a classificati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003